Topics in machine learning for biomedical literature analysis and text retrieval

Author: Islamaj Doğan Rezarta
Yeganova Lana
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

PubMed Central

Click-words: learning to predict document keywords from a user perspective

Author: Andrade
Aronson
Ciaramita
Dupret
Federiuk
Fuxman
Hawking
Hersh
Hulth
Islamaj Doğan
Ji
Jiang
Lacerda
Litvak
Liu
Liu
Liu
Lu
Manning
Matsuo
Rezarta Islamaj Doğan
Salton
Shen
Smith
Sohn
Tsai
Tsuruoka
Tudor
Yeganova
Yih
Zhang
Zhiyong Lu
Zhu
Publication venue: Oxford University Press

Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by statistical measures. As an alternative, we propose to use the popularity of a document's words in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords

Crossref

PubMed Central

A context-blocks model for identifying clinical relationships in patient records

Author: A Névéol
A Roberts
AK McCallum
AM Cohen
AR Aronson
Aurélie Névéol
C Friedman
ES Chen
F Leitner
H Shatkay
H Xu
J Aberdeen
J Björne
J Lafferty
L Smith
L Tanabe
M Bundschus
M Craven
M Krallinger
N Ponomareva
O Uzuner
O Uzuner
R Harpaz
R Islamaj Doğan
R Islamaj Doğan
Rezarta Islamaj Doğan
SM Meystre
SV Pakhomov
TC Rindflesch
TC Rindflesch
X Wang
X Wang
X Wang
Zhiyong Lu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Recommended from our members

BioC: a minimalist approach to interoperability for biomedical text processing

Author: Ciccarese Paolo
Cohen Kevin Bretonnel
Comeau Donald C.
Islamaj Doğan Rezarta
Krallinger Martin
Leitner Florian
Lu Zhiyong
Peng Yifan
Rinaldi Fabio
Torii Manabu
Valencia Alfonso
Verspoor Karin
Wiegers Thomas C.
Wilbur W. John
Wu Cathy H.
Publication venue: Oxford University Press (OUP)
Publication date: 11/03/2014

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create human-labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net

Harvard University - DASH

Topics in machine learning for biomedical literature analysis and text retrieval

Author: Islamaj Doğan Rezarta
Yeganova Lana
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2012

Directory of Open Access Journals

BioC Implementations in Go, Perl, Python and Ruby

Author: Comeau Donald C
Doğan Rezarta Islamaj
Kwon Dongseop
Liu Wanli
Marques Hernani
Rinaldi Fabio
Wilbur W John
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

As part of a community-wide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net

PubMed Central

ZORA